Non-Euclidean norms and data normalisation

Authors

  • Kevin Doherty
  • Rod Adams
  • Neil Davey
Abstract

In this paper, we empirically examine the use of a range of Minkowski norms for the clustering of real world data. We also investigate whether normalisation of the data prior to clustering affects the quality of the result. In a nearest neighbour search on raw real world data sets, fractional norms outperform the Euclidean and higher-order norms. However, when the data are normalised, the results of the nearest neighbour search with the fractional norms are very similar to the results obtained with the Euclidean norm. We show with the classic statistical technique, K-means clustering, and with the Neural Gas artificial neural network that on raw real world data the use of a fractional norm does not improve the recovery of cluster structure. However, the normalisation of the data results in improved recovery accuracy and minimises the effect of the differing norms.
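The interaction between norm order and normalisation described in the abstract can be illustrated with a minimal sketch. It assumes NumPy, and the helper names (`minkowski_distance`, `min_max_normalise`, `nearest_neighbour`) and the toy data are hypothetical. The abstract does not say which normalisation scheme was used, so min-max scaling here is an assumption, not the authors' method.

```python
import numpy as np


def minkowski_distance(x, y, p):
    """Minkowski distance of order p > 0.

    p = 2 is the Euclidean norm; 0 < p < 1 gives a "fractional norm"
    (strictly a quasi-norm, because the triangle inequality fails).
    """
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))


def min_max_normalise(data):
    """Rescale each column to [0, 1] (an assumed normalisation scheme)."""
    data = np.asarray(data, dtype=float)
    lo = data.min(axis=0)
    span = data.max(axis=0) - lo
    span[span == 0] = 1.0  # leave constant columns unchanged
    return (data - lo) / span


def nearest_neighbour(data, query_index, p):
    """Index of the row closest to data[query_index], excluding itself."""
    data = np.asarray(data, dtype=float)
    dists = np.array([minkowski_distance(data[query_index], row, p)
                      for row in data])
    dists[query_index] = np.inf  # a point is not its own neighbour
    return int(np.argmin(dists))


# Hypothetical toy data: the second feature has a much larger scale.
raw = np.array([[1.0, 1000.0],
                [1.2, 3000.0],
                [9.0, 1100.0],
                [5.0, 5000.0]])
scaled = min_max_normalise(raw)

for p in (0.5, 1.0, 2.0):
    print(p, nearest_neighbour(raw, 0, p), nearest_neighbour(scaled, 0, p))
```

On raw data the large-scale feature dominates every distance regardless of `p`; after rescaling, both features contribute comparably. This is only a toy illustration of why normalisation can matter more than the choice of norm, not a reproduction of the paper's experiments.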

Similar resources

Unsupervised learning with normalised data and non-Euclidean norms

The measurement of distance is one of the key steps in the unsupervised learning process, as it is through these distance measurements that patterns and correlations are discovered. We examined the characteristics of both non-Euclidean norms and data normalisation within the unsupervised learning environment. We empirically assessed the performance of the K-means, Neural Gas, Growing Neural Gas...

Non-Euclidean c-means clustering algorithms

This paper introduces non-Euclidean c-means clustering algorithms. These algorithms rely on weighted norms to measure the distance between the feature vectors and the prototypes that represent the clusters. The proposed algorithms are developed by solving a constrained minimization problem in an iterative fashion. The norm weights are determined from the data in an attempt to produce partitions...

Dimensionality Reduction with Unsupervised Feature Selection and Applying Non-Euclidean Norms for Classification Accuracy

This paper presents a two-phase scheme that selects a reduced number of features from a dataset using a Genetic Algorithm (GA) and tests the classification accuracy (CA) of the dataset with the reduced feature set. In the first phase of the proposed work, an unsupervised approach is applied to select a subset of features. GA is used to stochastically select a reduced number of features with Sammon Err...

Spatial Analysis in curved spaces with Non-Euclidean Geometry

The ultimate goal of spatial information, both as part of technology and as science, is to answer questions and issues related to space, place, and location. Therefore, geometry is widely used for description, storage, and analysis. Undoubtedly, one of the most essential features of spatial information is geometric features, and one of the most obvious types of analysis is the geometric type an...

Expectation Values and Variance Based on Lp-Norms

This analysis introduces a generalization of the basic statistical concepts of expectation values and variance for non-Euclidean metrics induced by Lp-norms. The non-Euclidean Lp means are defined by exploiting the fundamental property of minimizing the Lp deviations that compose the Lp variance. These Lp expectation values embody a generic formal scheme of means characterization. Having the p-norm ...

Publication date: 2004